PyTorch

introduction

load data

import pandas as pd

pd.read_csv(filepath, header=None, delim_whitespace=True)
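A minimal, self-contained sketch of that call on hypothetical in-memory data (the file contents here are made up for illustration). Newer pandas versions deprecate `delim_whitespace=True` in favor of the equivalent `sep=r"\s+"`:

```python
import io
import pandas as pd

# Hypothetical whitespace-delimited sample (a stand-in for a real data file)
raw = io.StringIO("5.1 3.5 1.4\n4.9 3.0 1.3\n")

# header=None: the file has no header row;
# sep=r"\s+" splits on any run of whitespace
# (equivalent to the older delim_whitespace=True)
df = pd.read_csv(raw, header=None, sep=r"\s+")
print(df.shape)  # (2, 3)
```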

data preprocess

model layout

MLP model

GRU model

CNN model

train model

train and test

train, validate, and test

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torch.nn.functional as F

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Vitis HLS 2022.1 tools install

Set up in a virtual environment with:

Get the necessary libraries

Run the following in a terminal:

sudo apt update
sudo apt upgrade
sudo apt install libncurses5
sudo apt install libtinfo5
sudo apt install libncurses5-dev libncursesw5-dev

Download Vitis HLS and install

You need to register a Xilinx account first, which requires email verification.

After that, go to Xilinx tool download page. Download Xilinx Unified Installer 2022.1: Linux Self Extracting Web Installer.

Download Xilinx bin

By default, the downloaded file should be in the ~/Downloads/ directory. Execute it:

chmod +x Xilinx_Unified_2022.1_<>_Lin64.bin
sudo ./Xilinx_Unified_2022.1_<>_Lin64.bin

This should open an installation interface. Enter your account and choose Download and Install Now. Keep all default installation options. Accept every license agreement. The installation takes around five hours, so make sure you have a stable internet connection.

Set environment

Execute the nano ~/.bashrc command and add the following line to the file:

source /tools/Xilinx/Vitis/2022.1/settings64.sh

Close and start a new Terminal

Apply WebPack License

Open Vivado with the vivado command. In the top bar, select Help -> Manage License...

Open License Manager

Select Obtain License. Select Get Free ISE WebPack, ISE/Vivado or PetaLinux Licenses. Click Connect Now.

Connect to License website

You should be directed to the license generation page. Log in. Select the WebPack License and generate it (if you can't find it, press Ctrl + '-' to shrink the text). Keep all default settings.

Select WebPack License

After that, switch to the Manage Licenses tab. You can either download the license or have it sent to your email.

Download License

Once you have downloaded your license file (.lic), go back to the Vivado license manager. Select Load License. Click Copy License.... Select your license file. Click Open.

Load license

Once done, you can select View License Status to check your license.

Check license

FINN

Be sure to have a licensed Vitis first.

install Docker engine

Reference

Execute the following commands in a terminal:

sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

Verify the Docker engine:

sudo service docker start
sudo docker run hello-world

config Docker to run without root

Reference

FINN requires Docker to run without root privileges:

sudo groupadd docker
sudo usermod -aG docker $USER

After that, restart your virtual machine. Verify your settings:

docker run hello-world

set environment

Execute the nano ~/.bashrc command and add the following lines to the file:

export FINN_XILINX_PATH=/tools/Xilinx
export FINN_XILINX_VERSION=2022.1
export FINN_HOST_BUILD_DIR=~/finn/build

export PYNQ_BOARD=Pynq-Z2

FINN environment

install FINN

Via git:

cd ~
sudo apt install git
git clone https://github.com/Xilinx/finn/
cd finn
mkdir build

Execute ./run-docker.sh quicktest to check that your setup is correct. You should see at most one error at the end of the test.

PYNQ board first time setup

install bitstring

Connect your PYNQ board to your virtual machine. By default, its IP address is 192.168.2.99, its username is xilinx, and its password is xilinx. Install the bitstring library on PYNQ. If your PYNQ board is able to connect to the internet, run sudo python3.6 -m pip install bitstring on PYNQ; if not, download the bitstring tar.gz, upload it to PYNQ, then execute sudo python3.6 -m pip install <your bitstring tar.gz file>

setup SSH key

Execute ./run-docker.sh in your finn directory to launch a FINN Docker container. Run the following commands in it:

cd ssh_keys
# Keep everything default in the next command
ssh-keygen
ssh-copy-id -i id_rsa.pub xilinx@192.168.2.99

Test with the ssh xilinx@192.168.2.99 command. You should be able to log in without a password.

open finn jupyter notebook

Open a new terminal, go to the finn directory, and execute ./run-docker.sh notebook. After about 10 minutes, you should get a URL at the bottom. Ctrl-click it to open the finn Jupyter interface.

finn jupyter link

example: deploy an MLP model on a PYNQ board with HLS

The following are executed on finn jupyter.

Train an MLP model with Brevitas library

Reference: end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb. You can go there and run these commands yourself.

Remember to import onnx before PyTorch.

import onnx
import torch

Get pre-quantized dataset

! wget -O unsw_nb15_binarized.npz https://zenodo.org/record/4519767/files/unsw_nb15_binarized.npz?download=1

Get the train TensorDataset and test TensorDataset

import numpy as np
from torch.utils.data import TensorDataset

def get_preqnt_dataset(data_dir: str, train: bool):
    unsw_nb15_data = np.load(data_dir + "/unsw_nb15_binarized.npz")
    if train:
        partition = "train"
    else:
        partition = "test"
    part_data = unsw_nb15_data[partition].astype(np.float32)
    part_data = torch.from_numpy(part_data)
    part_data_in = part_data[:, :-1]
    part_data_out = part_data[:, -1]
    return TensorDataset(part_data_in, part_data_out)

train_quantized_dataset = get_preqnt_dataset(".", True)
test_quantized_dataset = get_preqnt_dataset(".", False)
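The slicing inside get_preqnt_dataset assumes each row is the feature vector followed by a single 0/1 label in the last column. A tiny numpy sketch (toy 3-feature rows, not real UNSW-NB15 data) shows the split:

```python
import numpy as np

# Hypothetical stand-in for one partition of the .npz file:
# each row is a few binarized features followed by a 0/1 label
part_data = np.array([
    [0.0, 1.0, 1.0, 0.0],   # 3 features + label 0
    [1.0, 0.0, 1.0, 1.0],   # 3 features + label 1
], dtype=np.float32)

features = part_data[:, :-1]  # everything except the last column
labels = part_data[:, -1]     # the last column only

print(features.shape, labels.shape)  # (2, 3) (2,)
```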

Get train Dataloader and test Dataloader

from torch.utils.data import DataLoader, Dataset

batch_size = 1000

# dataset loaders
train_quantized_loader = DataLoader(train_quantized_dataset, batch_size=batch_size, shuffle=True)
test_quantized_loader = DataLoader(test_quantized_dataset, batch_size=batch_size, shuffle=False)
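Conceptually, a DataLoader with shuffle=True permutes the sample indices once per epoch and then yields consecutive slices of batch_size rows, with the last batch possibly smaller. A minimal numpy sketch of that behavior (names and data are illustrative, not part of the FINN flow):

```python
import numpy as np

def minibatches(data, batch_size, shuffle, seed=0):
    # optionally shuffle the indices, then yield slices of batch_size rows
    idx = np.arange(len(data))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(data), batch_size):
        yield data[idx[start:start + batch_size]]

data = np.arange(10).reshape(10, 1)
sizes = [len(b) for b in minibatches(data, batch_size=4, shuffle=True)]
print(sizes)  # [4, 4, 2]
```

Every sample still appears exactly once per epoch; only the order changes.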

Detect GPU or CPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Define MLP model hyperparameter

input_size = 593
hidden1 = 64
hidden2 = 64
hidden3 = 64
weight_bit_width = 2
act_bit_width = 2
num_classes = 1

num_epochs = 10
lr = 0.001

Define quantized MLP model structure

from brevitas.nn import QuantLinear, QuantReLU
import torch.nn as nn

# Setting seeds for reproducibility
torch.manual_seed(0)

model = nn.Sequential(
      QuantLinear(input_size, hidden1, bias=True, weight_bit_width=weight_bit_width),
      nn.BatchNorm1d(hidden1),
      nn.Dropout(0.5),
      QuantReLU(bit_width=act_bit_width),
      QuantLinear(hidden1, hidden2, bias=True, weight_bit_width=weight_bit_width),
      nn.BatchNorm1d(hidden2),
      nn.Dropout(0.5),
      QuantReLU(bit_width=act_bit_width),
      QuantLinear(hidden2, hidden3, bias=True, weight_bit_width=weight_bit_width),
      nn.BatchNorm1d(hidden3),
      nn.Dropout(0.5),
      QuantReLU(bit_width=act_bit_width),
      QuantLinear(hidden3, num_classes, bias=True, weight_bit_width=weight_bit_width)
)

model.to(device)

Define the model training method

def train(model, train_loader, optimizer, criterion):
    losses = []
    # ensure model is in training mode
    model.train()

    for i, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()

        # forward pass
        output = model(inputs.float())
        loss = criterion(output, target.unsqueeze(1))

        # backward pass + run optimizer to update weights
        loss.backward()
        optimizer.step()

        # keep track of loss value
        losses.append(loss.data.cpu().numpy())

    return losses

Define the model accuracy calculation method

import torch
from sklearn.metrics import accuracy_score

def test(model, test_loader):
    # ensure model is in eval mode
    model.eval()
    y_true = []
    y_pred = []

    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            output_orig = model(inputs.float())
            # run the output through sigmoid
            output = torch.sigmoid(output_orig)
            # compare against a threshold of 0.5 to generate 0/1
            pred = (output.detach().cpu().numpy() > 0.5) * 1
            target = target.cpu().float()
            y_true.extend(target.tolist())
            y_pred.extend(pred.reshape(-1).tolist())

    return accuracy_score(y_true, y_pred)
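The sigmoid-then-0.5 threshold used above is equivalent to thresholding the raw logit at zero, since sigmoid is monotonic and sigmoid(0) = 0.5. A small numpy sketch (independent of the model) illustrates this:

```python
import numpy as np

logits = np.array([-2.0, -0.1, 0.0, 0.3, 5.0])
sigmoid = 1.0 / (1.0 + np.exp(-logits))

# thresholding sigmoid(x) at 0.5 gives the same labels
# as thresholding x at 0
pred_a = (sigmoid > 0.5) * 1
pred_b = (logits > 0) * 1
print(np.array_equal(pred_a, pred_b))  # True
```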

Define the loss function and optimizer

# loss criterion and optimizer
criterion = nn.BCEWithLogitsLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))

Train the model

import numpy as np
from sklearn.metrics import accuracy_score
from tqdm import tqdm, trange

# Setting seeds for reproducibility
torch.manual_seed(0)
np.random.seed(0)

t = trange(num_epochs, desc="Training loss", leave=True)

for epoch in t:
    loss_epoch = train(model, train_quantized_loader, optimizer, criterion)
    test_acc = test(model, test_quantized_loader)
    t.set_description("Training loss = %f test accuracy = %f" % (np.mean(loss_epoch), test_acc))
    t.refresh()  # to show the update immediately

Test and save the model state for later training

test(model, test_quantized_loader)

# Save the Brevitas model to disk
torch.save(model.state_dict(), "state_dict_self-trained.pth")

Prepare for Network Surgery

# Move the model to CPU before surgery
model = model.cpu()

Pad the input (593 -> 600) to make the later FINN steps easier

from copy import deepcopy

modified_model = deepcopy(model)

W_orig = modified_model[0].weight.data.detach().numpy()
W_orig.shape
import numpy as np

# pad the second (593-sized) dimensions with 7 zeroes at the end
W_new = np.pad(W_orig, [(0,0), (0,7)])
W_new.shape
modified_model[0].weight.data = torch.from_numpy(W_new)
modified_model[0].weight.shape
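The padding step above can be illustrated with toy shapes; here a hypothetical 4x3 weight matrix stands in for the real 64x593 one being padded to 64x600:

```python
import numpy as np

# Toy weight matrix: 4 output neurons x 3 input features
W = np.ones((4, 3), dtype=np.float32)

# [(0, 0), (0, 7)]: no padding on rows, 7 zero columns appended
W_padded = np.pad(W, [(0, 0), (0, 7)])
print(W.shape, "->", W_padded.shape)  # (4, 3) -> (4, 10)

# the padded columns are zero, so they contribute nothing
# to the layer's dot product
print(W_padded[:, 3:].sum())  # 0.0
```

Because the new columns are all zeros, the padded network computes exactly the same outputs as the original for inputs whose extra 7 elements are arbitrary.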

Turn [0, 1] input into [-1, +1] input

from brevitas.nn import QuantIdentity


class CybSecMLPForExport(nn.Module):
    def __init__(self, my_pretrained_model):
        super(CybSecMLPForExport, self).__init__()
        self.pretrained = my_pretrained_model
        self.qnt_output = QuantIdentity(
            quant_type='binary',
            scaling_impl_type='const',
            bit_width=1, min_val=-1.0, max_val=1.0)

    def forward(self, x):
        # assume x contains bipolar {-1,1} elems
        # shift from {-1,1} -> {0,1} since that is the
        # input range for the trained network
        x = (x + torch.tensor([1.0]).to(x.device)) / 2.0
        out_original = self.pretrained(x)
        out_final = self.qnt_output(out_original)   # output as {-1,1}
        return out_final

model_for_export = CybSecMLPForExport(modified_model)
model_for_export.to(device)
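The shift inside forward() maps bipolar {-1, +1} inputs back to the {0, 1} range the network was trained on; the inverse mapping (2x - 1) appears elsewhere in the flow. A standalone numpy sketch of both directions:

```python
import numpy as np

bipolar = np.array([-1.0, 1.0, -1.0, 1.0])

# the shift used in forward(): {-1, +1} -> {0, 1}
binary = (bipolar + 1.0) / 2.0
print(binary)  # [0. 1. 0. 1.]

# the inverse used when preparing inputs: {0, 1} -> {-1, +1}
back = 2.0 * binary - 1.0
print(np.array_equal(back, bipolar))  # True
```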

Verify modified model's accuracy

def test_padded_bipolar(model, test_loader):
    # ensure model is in eval mode
    model.eval()
    y_true = []
    y_pred = []

    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            # pad inputs to 600 elements
            input_padded = torch.nn.functional.pad(inputs, (0,7,0,0))
            # convert inputs to {-1,+1}
            input_scaled = 2 * input_padded - 1
            # run the model
            output = model(input_scaled.float())
            y_pred.extend(list(output.flatten().cpu().numpy()))
            # make targets bipolar {-1,+1}
            expected = 2 * target.float() - 1
            expected = expected.cpu().numpy()
            y_true.extend(list(expected.flatten()))

    return accuracy_score(y_true, y_pred)

test_padded_bipolar(model_for_export, test_quantized_loader)

Export to FINN-ONNX

import brevitas.onnx as bo
from brevitas.quant_tensor import QuantTensor

ready_model_filename = "cybsec-mlp-ready.onnx"
input_shape = (1, 600)

# create a QuantTensor instance to mark input as bipolar during export
input_a = np.random.randint(0, 2, size=input_shape).astype(np.float32)
input_a = 2 * input_a - 1
scale = 1.0
input_t = torch.from_numpy(input_a * scale)
input_qt = QuantTensor(
    input_t, scale=torch.tensor(scale), bit_width=torch.tensor(1.0), signed=True
)

#Move to CPU before export
model_for_export.cpu()

# Export to ONNX
bo.export_finn_onnx(
    model_for_export, export_path=ready_model_filename, input_t=input_qt
)

print("Model saved to %s" % ready_model_filename)

View the Exported ONNX in Netron

from finn.util.visualization import showInNetron

showInNetron(ready_model_filename)

Verify Exported ONNX Model in FINN and tidy-up model

Reference: end2end_example/cybersecurity/2-import-into-finn-and-verify.ipynb. You can go there and run these commands yourself. Be sure to run the previous part to get the necessary .onnx files, then close and halt that notebook because Netron visualizations use the same port.

import onnx
import torch

Import the ONNX model into FINN

from qonnx.core.modelwrapper import ModelWrapper

ready_model_filename = "cybsec-mlp-ready.onnx"
model_for_sim = ModelWrapper(ready_model_filename)

Apply tidy-up graph transformations

from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames, RemoveStaticGraphInputs
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.fold_constants import FoldConstants

model_for_sim = model_for_sim.transform(InferShapes())
model_for_sim = model_for_sim.transform(FoldConstants())
model_for_sim = model_for_sim.transform(GiveUniqueNodeNames())
model_for_sim = model_for_sim.transform(GiveReadableTensorNames())
model_for_sim = model_for_sim.transform(InferDataTypes())
model_for_sim = model_for_sim.transform(RemoveStaticGraphInputs())

verif_model_filename = "cybsec-mlp-verification.onnx"
model_for_sim.save(verif_model_filename)

See the model structure after transformation

from finn.util.visualization import showInNetron

showInNetron(verif_model_filename)

To verify the model after transformation, load the dataset:

import numpy as np
from torch.utils.data import TensorDataset

def get_preqnt_dataset(data_dir: str, train: bool):
    unsw_nb15_data = np.load(data_dir + "/unsw_nb15_binarized.npz")
    if train:
        partition = "train"
    else:
        partition = "test"
    part_data = unsw_nb15_data[partition].astype(np.float32)
    part_data = torch.from_numpy(part_data)
    part_data_in = part_data[:, :-1]
    part_data_out = part_data[:, -1]
    return TensorDataset(part_data_in, part_data_out)

n_verification_inputs = 100
test_quantized_dataset = get_preqnt_dataset(".", False)
input_tensor = test_quantized_dataset.tensors[0][:n_verification_inputs]
input_tensor.shape

Load the model before transformation

input_size = 593
hidden1 = 64
hidden2 = 64
hidden3 = 64
weight_bit_width = 2
act_bit_width = 2
num_classes = 1

from brevitas.nn import QuantLinear, QuantReLU
import torch.nn as nn

brevitas_model = nn.Sequential(
      QuantLinear(input_size, hidden1, bias=True, weight_bit_width=weight_bit_width),
      nn.BatchNorm1d(hidden1),
      nn.Dropout(0.5),
      QuantReLU(bit_width=act_bit_width),
      QuantLinear(hidden1, hidden2, bias=True, weight_bit_width=weight_bit_width),
      nn.BatchNorm1d(hidden2),
      nn.Dropout(0.5),
      QuantReLU(bit_width=act_bit_width),
      QuantLinear(hidden2, hidden3, bias=True, weight_bit_width=weight_bit_width),
      nn.BatchNorm1d(hidden3),
      nn.Dropout(0.5),
      QuantReLU(bit_width=act_bit_width),
      QuantLinear(hidden3, num_classes, bias=True, weight_bit_width=weight_bit_width)
)

# replace this with your trained network checkpoint if you're not
# using the pretrained weights
trained_state_dict = torch.load("state_dict.pth")["models_state_dict"][0]
# Uncomment the following line if you previously chose to train the network yourself
#trained_state_dict = torch.load("state_dict_self-trained.pth")

brevitas_model.load_state_dict(trained_state_dict, strict=False)

Adjust input for normal model

def inference_with_brevitas(current_inp):
    brevitas_output = brevitas_model.forward(current_inp)
    # apply sigmoid + threshold
    brevitas_output = torch.sigmoid(brevitas_output)
    brevitas_output = (brevitas_output.detach().numpy() > 0.5) * 1
    # convert output to bipolar
    brevitas_output = 2*brevitas_output - 1
    return brevitas_output

Adjust input for the transformed model

import finn.core.onnx_exec as oxe

def inference_with_finn_onnx(current_inp):
    finnonnx_in_tensor_name = model_for_sim.graph.input[0].name
    finnonnx_model_in_shape = model_for_sim.get_tensor_shape(finnonnx_in_tensor_name)
    finnonnx_out_tensor_name = model_for_sim.graph.output[0].name
    # convert input to numpy for FINN
    current_inp = current_inp.detach().numpy()
    # add padding and re-scale to bipolar
    current_inp = np.pad(current_inp, [(0, 0), (0, 7)])
    current_inp = 2*current_inp-1
    # reshape to expected input (add 1 for batch dimension)
    current_inp = current_inp.reshape(finnonnx_model_in_shape)
    # create the input dictionary
    input_dict = {finnonnx_in_tensor_name : current_inp}
    # run with FINN's execute_onnx
    output_dict = oxe.execute_onnx(model_for_sim, input_dict)
    #get the output tensor
    finn_output = output_dict[finnonnx_out_tensor_name]
    return finn_output

Compare two models

import numpy as np
from tqdm import trange

verify_range = trange(n_verification_inputs, desc="FINN execution", position=0, leave=True)
brevitas_model.eval()

ok = 0
nok = 0

for i in verify_range:
    # run in Brevitas with PyTorch tensor
    current_inp = input_tensor[i].reshape((1, 593))
    brevitas_output = inference_with_brevitas(current_inp)
    finn_output = inference_with_finn_onnx(current_inp)
    # compare the outputs
    ok += 1 if finn_output == brevitas_output else 0
    nok += 1 if finn_output != brevitas_output else 0
    verify_range.set_description("ok %d nok %d" % (ok, nok))
    verify_range.refresh()

if ok == n_verification_inputs:
    print("Verification succeeded. Brevitas and FINN-ONNX execution outputs are identical")
else:
    print("Verification failed. Brevitas and FINN-ONNX execution outputs are NOT identical")

Building the Streaming Dataflow Accelerator

Reference: end2end_example/cybersecurity/3-build-accelerator-with-finn.ipynb. You can go there and run these commands yourself. Be sure to run the previous part to get the necessary .onnx files, then close and halt that notebook because Netron visualizations use the same port.

Launch a build that only generates the estimate reports; this does not require any synthesis.

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_file = "cybsec-mlp-ready.onnx"

estimates_output_dir = "output_estimates_only"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")


cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_cfg.estimate_only_dataflow_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)
%%time
build.build_dataflow_cfg(model_file, cfg_estimates)

The generated reports will be in the output_estimates_only/report directory.

Generate the accelerator. This will take about 10 minutes, because multiple calls to Vivado and a call to RTL simulation are involved.

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_file = "cybsec-mlp-ready.onnx"

rtlsim_output_dir = "output_ipstitch_ooc_rtlsim"

# Delete previous run results if they exist
if os.path.exists(rtlsim_output_dir):
    shutil.rmtree(rtlsim_output_dir)
    print("Previous run results deleted!")

cfg_stitched_ip = build.DataflowBuildConfig(
    output_dir          = rtlsim_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    generate_outputs=[
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        build_cfg.DataflowOutputType.OOC_SYNTH,
    ]
)
%%time
build.build_dataflow_cfg(model_file, cfg_stitched_ip)

You will find the accelerator exported as a stitched IP block design in the output_ipstitch_ooc_rtlsim/stitched_ip directory, and various reports in the output_ipstitch_ooc_rtlsim/report directory.

Generate PYNQ bitfile and driver. This will take about 15 ~ 20 minutes.

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_file = "cybsec-mlp-ready.onnx"

final_output_dir = "output_final"

# Delete previous run results if they exist
if os.path.exists(final_output_dir):
    shutil.rmtree(final_output_dir)
    print("Previous run results deleted!")

cfg = build.DataflowBuildConfig(
    output_dir          = final_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    board               = "Pynq-Z2",
    shell_flow_type     = build_cfg.ShellFlowType.VIVADO_ZYNQ,
    generate_outputs=[
        build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ]
)
%%time
build.build_dataflow_cfg(model_file, cfg)

The generated bitfile and .hwh file are located in the output_final/bitfile directory. The generated Python driver lets us execute the accelerator on PYNQ platforms with simple numpy I/O; it is located in the output_final/driver directory. Reports are in the output_final/report directory. Finally, the output_final/deploy folder contains everything you need to copy onto the target board to get the accelerator running.

To test the accelerator on the board, we'll put a copy of the dataset and a premade Python script that validates the accuracy into the output_final/deploy/driver folder, then make a zip archive of the whole deployment folder.

! cp unsw_nb15_binarized.npz {final_output_dir}/deploy/driver
! cp validate-unsw-nb15.py {final_output_dir}/deploy/driver
from shutil import make_archive
make_archive('deploy-on-pynq', 'zip', final_output_dir+"/deploy")
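shutil.make_archive, as used above, takes a base name (without extension), an archive format, and a root directory, and returns the path of the created archive. A self-contained sketch with a hypothetical stand-in directory (temp paths here are illustrative):

```python
import os
import tempfile
from shutil import make_archive

# Hypothetical stand-in for the deploy folder
deploy_dir = tempfile.mkdtemp()
with open(os.path.join(deploy_dir, "driver.py"), "w") as f:
    f.write("# placeholder driver\n")

# make_archive(base_name, format, root_dir) zips the contents of root_dir
archive_path = make_archive(
    os.path.join(tempfile.mkdtemp(), "deploy-demo"), "zip", deploy_dir
)
print(archive_path.endswith(".zip"))  # True
```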

You can now download the created zip file (File -> Open, mark the checkbox next to deploy-on-pynq.zip, and select Download from the toolbar), then copy it to your PYNQ board (for instance via scp or rsync). Then run the following commands on the PYNQ board (for example, in a terminal opened from Jupyter on PYNQ) to extract the archive and run the validation:

unzip deploy-on-pynq.zip -d finn-cybsec-mlp-demo
cd finn-cybsec-mlp-demo/driver
sudo python3.6 validate-unsw-nb15.py --batchsize 1000

You should see Final accuracy: 91.868293 at the end.

To see details while running validation, the generated driver includes a benchmarking mode that shows the runtime breakdown:

sudo python3.6 driver.py --exec_mode throughput_test --bitfile ../bitfile/finn-accel.bit --batchsize 1000
cat nw_metrics.txt